Bank Churn Prediction Project.

Contents:


Context:

Description.

Objective: Given a bank customer, build a neural-network-based classifier that can determine whether they will leave the bank in the next 6 months.

Context: Businesses like banks that provide services have to worry about the problem of 'Churn', i.e. customers leaving and joining another service provider. It is important to understand which aspects of the service influence a customer's decision in this regard, so that management can concentrate improvement efforts on those priorities.

Data Description: The case study is from an open-source dataset from Kaggle. The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance, etc. Link to the Kaggle project site: https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling

Data Dictionary:

  - RowNumber: Row number.
  - CustomerId: Unique identification key for each customer.
  - Surname: Surname of the customer.
  - CreditScore: A measure of an individual's ability to pay back a borrowed amount; the numerical representation of their creditworthiness. A credit score is a 3-digit number in the range 300-900, 900 being the highest.
  - Geography: The country to which the customer belongs.
  - Gender: The gender of the customer.
  - Age: Age of the customer.
  - Tenure: The period of time the customer has been associated with the bank.
  - Balance: The account balance (the amount of money deposited in the bank account) of the customer.
  - NumOfProducts: The number of bank accounts and bank-affiliated products the customer holds.
  - HasCrCard: Whether the customer has a credit card through the bank.
  - IsActiveMember: Whether the customer is an active member (a subjective measure).
  - EstimatedSalary: Estimated salary of the customer.
  - Exited: Whether the customer left the bank.

Points Distribution: The points distribution for this case is as follows:

Tasks.

  1. Read the dataset
  2. Drop the columns which are unique for all users like IDs (5 points)
  3. Perform bivariate analysis and give your insights from the same (5 points)
  4. Distinguish the feature and target set and divide the data set into training and test sets (5 points)
  5. Normalize the train and test data (10 points)
  6. Initialize & build the model. Identify the points of improvement and implement the same. (20 points)
  7. Predict the results using 0.5 as a threshold (10 points)
  8. Print the Accuracy score and confusion matrix (5 points)

Environment and Algorithm Techniques.

Implementation Steps.

Importing Libraries

Reading and Reviewing the Dataset.

Application Type.

The data file bank.csv contains 12 features for 10,000 clients of the bank.

The features or variables are the following:

Observation.

There are two categorical features that need to be encoded.

Transposing index and columns.

EDA Descriptive Observations.

Observation.

Identifying Outliers with the Interquartile Range (IQR).

The IQR is calculated as the difference between the 75th and 25th percentiles, represented by the formula IQR = Q3 - Q1. The lines of code below calculate and print the interquartile range for each of the variables in the dataset.
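The calculation can be sketched as follows with pandas; the small frame here is a hypothetical stand-in for the bank dataset's numeric columns, and the 1.5 × IQR rule is the common convention for flagging outliers.

```python
import pandas as pd

# Hypothetical numeric sample standing in for columns of bank.csv
df = pd.DataFrame({
    "CreditScore": [350, 600, 650, 700, 720, 850],
    "Age": [22, 30, 35, 40, 58, 92],
})

# IQR = Q3 - Q1 for every numeric column
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1
print(iqr)

# A common rule flags values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers
outliers = (df < (q1 - 1.5 * iqr)) | (df > (q3 + 1.5 * iqr))
print(outliers.sum())
```

Summing the boolean mask per column gives a quick count of flagged values for each feature.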

Observation.

A roughly normal distribution on the 'CreditScore' feature.

Observation.

The 'Age' distribution is approximately normal with a skew to the right.

Observation.

Observation.

Customers with 3 to 4 products have a higher chance of churning. About 20.4% of the customers have churned; as a baseline, a model could simply predict that 20.4% of the current customers will churn. Although 20.4% is a small number, we need to make sure that the chosen model predicts this 20.4% with great accuracy: these customers are the ones the bank wants to identify and win back, as opposed to accurately predicting the customers who were retained.

Note that customers aged 40 to 70 have a higher chance of churning, as do customers with a CreditScore below 400.

Observation.

There are a few outliers in the 'Age', 'NumOfProducts', 'HasCrCard', and 'CreditScore' distributions.

Checking for errors, duplicates, and missing values.

Bivariate Analysis.

Data Preparation.

The following lines of code fill in null values in the dataset. Some values in the features are null; we will use the pandas fillna() function to fill them.
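A minimal sketch of the filling step, on a toy frame standing in for the bank dataset; filling numeric gaps with the median and categorical gaps with the mode is one common convention, not necessarily the exact strategy used in the original notebook.

```python
import pandas as pd
import numpy as np

# Toy frame with missing values; real code operates on the bank dataset
df = pd.DataFrame({
    "Age": [35, np.nan, 42],
    "Geography": ["France", "France", None],
})

# Numeric gaps filled with the column median, categorical gaps with the mode
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Geography"] = df["Geography"].fillna(df["Geography"].mode()[0])

print(df.isnull().sum())  # every count should now be zero
```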

The following code segregates the input and output variables and drops the features which are unique for all users.
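The step can be sketched like this; the identifier columns named below are taken from the data dictionary, while the tiny frame itself is a hypothetical stand-in.

```python
import pandas as pd

# Minimal stand-in for bank.csv
df = pd.DataFrame({
    "RowNumber": [1, 2],
    "CustomerId": [101, 102],
    "Surname": ["Smith", "Lee"],
    "CreditScore": [650, 700],
    "Exited": [1, 0],
})

# Identifier columns are unique per row and carry no predictive signal,
# so they are dropped along with the target to form the feature set
X = df.drop(columns=["RowNumber", "CustomerId", "Surname", "Exited"])
y = df["Exited"]

print(X.columns.tolist())
```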

Encoding Dataset Features.

One-Hot Encoding: The dataset contains two categorical variables, 'Gender' and 'Geography'. Categorical values are of string datatype and are essential features in model training, so one-hot encoding transforms them into numerical values.
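One way to do this is with pandas get_dummies; whether the original notebook used this function or a scikit-learn encoder is an assumption, but the transformation is the same. drop_first=True removes one indicator per variable to avoid redundant columns.

```python
import pandas as pd

df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany"],
    "Gender": ["Female", "Male", "Male"],
    "CreditScore": [650, 700, 580],
})

# get_dummies replaces each categorical column with 0/1 indicator columns;
# drop_first=True drops the first category of each variable
encoded = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True)

print(encoded.columns.tolist())
```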

DataSet Merging

Dataset Splitting Train and Test.
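A sketch of the split with scikit-learn's train_test_split; the 80/20 ratio and the stratify option are common choices for an imbalanced churn target, assumed rather than taken from the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy feature matrix and balanced binary target
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# 80/20 split; stratify keeps the churn ratio the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(X_train.shape, X_test.shape)
```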

Normalize the train and test data.

Feature Scaling: This technique standardizes the independent input features. It is done to avoid the dominance of one feature over the others, which could skew the prediction results.
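A minimal sketch with StandardScaler, assuming that is the scaler used; the key point is that the scaler is fitted on the training data only and then applied to the test data, so no information leaks from the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training and test features (e.g. CreditScore, Age)
X_train = np.array([[600.0, 40.0], [700.0, 30.0], [500.0, 50.0]])
X_test = np.array([[650.0, 35.0]])

# Fit on the training data only, then apply the same transform to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # no leakage from the test set

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

After scaling, each training column has mean 0 and standard deviation 1.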

Converting the Xtrain into TF.

Building ANN & Model Training.

Compiling the ANN

Fitting the ANN on Training Set.

Training [Forward pass and Backpropagation]
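The build / compile / fit steps above can be sketched in Keras as follows. The layer widths, the 11-feature input, and the toy random data are illustrative assumptions; the sigmoid output with binary cross-entropy matches the binary churn target, and each epoch of fit() performs one forward pass and backpropagation over the training set.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy scaled features standing in for the real scaled training set
X_train = np.random.rand(64, 11).astype("float32")
y_train = np.random.randint(0, 2, size=(64,))

# A small feed-forward ANN; the layer widths here are illustrative choices
model = keras.Sequential([
    layers.Input(shape=(11,)),
    layers.Dense(8, activation="relu"),
    layers.Dense(8, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # churn probability
])

# Compile with binary cross-entropy for the two-class target
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Each epoch = one forward pass + backpropagation over the training set
history = model.fit(X_train, y_train, epochs=2, batch_size=16, verbose=0)
```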

Observation.

As indicated above, by the 100th epoch the loss has decreased and the accuracy has increased, which is a good indication of the model's performance.

Classification Report and Confusion matrix Plotting:

The following lines of code print a classification report to measure the quality of predictions from a classification algorithm. It indicates how many predictions are true and how many are not, breaking them down into true positives, false positives, true negatives, and false negatives.
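With scikit-learn this looks like the sketch below; the label arrays are hypothetical stand-ins for the test labels and the thresholded predictions. In the confusion matrix, rows are actual classes and columns are predicted classes.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Hypothetical true labels and thresholded predictions
y_test = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_test, y_pred)   # rows: actual, cols: predicted
acc = accuracy_score(y_test, y_pred)

print(cm)
print(classification_report(y_test, y_pred))
print("accuracy:", acc)
```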

Prediction and Model Score/Accuracy.

Making Predictions

After training the ANN model, it is ready to be checked on its capability of predicting future churn results with our test set.
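The 0.5 threshold from the task list is applied to the sigmoid outputs; the probability values below are hypothetical stand-ins for what model.predict(X_test) would return.

```python
import numpy as np

# Hypothetical sigmoid outputs from model.predict(X_test)
probs = np.array([[0.10], [0.85], [0.45]])

# Apply the 0.5 threshold: 1 means the customer is predicted to churn
y_pred = (probs > 0.5).astype(int).ravel()

print(y_pred)
```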

Observation.

Based on the model's predictions, the first five customers will not leave the bank, while the sixth might leave.

Evaluating the ANN Model.

Improving the ANN (Dropout).
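Dropout layers can be inserted after each hidden layer to reduce overfitting; the sketch below reuses the illustrative architecture from earlier, and the 0.2 rate is an assumed value that would normally be tuned.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Same illustrative architecture with Dropout after each hidden layer
model = keras.Sequential([
    layers.Input(shape=(11,)),
    layers.Dense(8, activation="relu"),
    layers.Dropout(0.2),   # randomly zeroes 20% of activations in training
    layers.Dense(8, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

Dropout is active only during training; at prediction time the full network is used.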

Tuning The ANN Model using GridSearch.

GridSearch is the process of performing hyperparameter tuning to determine the optimal values for a given model. As mentioned above, a model's performance depends significantly on its hyperparameter values. There is no way to know the best values in advance, so ideally we would try all possible values; doing this manually could take a considerable amount of time and resources, so we use GridSearchCV to automate the tuning of hyperparameters.
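A self-contained sketch of the idea, using scikit-learn's MLPClassifier for brevity rather than the Keras model (which would need a scikit-learn-compatible wrapper); the grid values and the toy data are illustrative assumptions. GridSearchCV fits every combination in the grid with cross-validation and keeps the best one.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Toy data standing in for the scaled training set
rng = np.random.default_rng(0)
X = rng.random((80, 11))
y = rng.integers(0, 2, size=80)

# Candidate hyperparameter values; every combination is tried with CV
param_grid = {
    "hidden_layer_sizes": [(8,), (8, 8)],
    "alpha": [1e-4, 1e-3],
}
grid = GridSearchCV(MLPClassifier(max_iter=200, random_state=0),
                    param_grid, cv=3, scoring="accuracy")
grid.fit(X, y)

print(grid.best_params_, grid.best_score_)
```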

Best Parameters & Best Accuracy.

Conclusion.

Customer churn is the major problem of customers leaving a service or product subscription and switching to another provider. Because of the direct effect on profit margins, business organizations now look to identify customers who are at risk of churning and retain them, winning back their loyalty and coping with competition through personalized promotional offers and added value in service quality. To do so, they need to identify the at-risk customers as well as the reasons why they churn, so they can provide those customers with the best personalized offers and products that suit their needs; this case study set out to find a solution to such a problem. The objective of the case study was to predict whether a bank customer will leave in the next 6 months by building a neural-network-based classifier.

The initial observations about the problem:

References & GitHub Link.

GitHub Link